AITopics | Conakry Region

Machine Translation for Nko: Tools, Corpora and Baseline Results

Doumbouya, Moussa Koulako Bala, Diané, Baba Mamadi, Cissé, Solo Farabado, Diané, Djibrila, Sow, Abdoulaye, Doumbouya, Séré Moussa, Bangoura, Daouda, Bayo, Fodé Moriba, Condé, Ibrahima Sory 2., Diané, Kalo Mory, Piech, Chris, Manning, Christopher

arXiv.org Artificial Intelligence, Nov-15-2023

Unfortunately, to date, there isn't any usable machine translation (MT) system for Nko, in part due to the unavailability of large text corpora required by state-of-the-art neural machine translation (NMT) algorithms. Nko is a representative case study of the broader issues that interfere with the goal of universal machine translation. Thousands of languages still don't have available or usable MT systems, mainly due to the unavailability of high-quality parallel ...

... over 40 million people across West African countries including Mali, Guinea, Ivory Coast, Gambia, Burkina Faso, Sierra Leone, Senegal, Liberia, and Guinea-Bissau. Nko, which means 'I say' in all Manding languages, was developed as both the Manding literary standard language and a writing system by Soulemana Kanté in 1949 for the purpose of sustaining the strong oral tradition of Manding languages (Niane, 1974; Conde, 2017; Eberhard et al., 2023).

dataset, translation, translator, (15 more...)

arXiv:2310.15612

Country:

Africa > The Gambia (0.24)
Africa > Sierra Leone (0.24)
Africa > Senegal (0.24)
(20 more...)

Genre:

Research Report (0.64)
Questionnaire & Opinion Survey (0.46)

Industry: Education (0.92)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data

Naggita, Keziah, LaChance, Julienne, Xiang, Alice

arXiv.org Artificial Intelligence, Aug-16-2023

Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa. We report the quantity and content of available data with comparisons to population-matched nations in Europe as well as the distribution of data according to fine-grained intra-national wealth estimates. Temporal analyses are performed at two-year intervals to expose emerging data trends. Furthermore, we present findings for an "othering" phenomenon as evidenced by a substantial number of images from Africa being taken by non-local photographers. The results of our study suggest that further work is required to capture image data representative of African people and their environments and, ultimately, to improve the applicability of computer vision models in a global context.

artificial intelligence, geotagged image, social media, (13 more...)

doi: 10.1145/3600211.3604659

arXiv:2308.08656

Country:

Asia > Brunei (0.14)
North America > Canada > Quebec > Montreal (0.06)
Africa > Sierra Leone (0.06)
(142 more...)

Genre: Research Report > Experimental Study (0.66)

Industry:

Health & Medicine (0.92)
Information Technology > Services (0.75)
Government > Regional Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)

Modelling spatio-temporal trends of air pollution in Africa

Gahungu, Paterne, Kubwimana, Jean Remy, Muhimpundu, Lionel Jean Marie Benjamin, Ndamuzi, Egide

arXiv.org Artificial Intelligence, Aug-21-2022

Atmospheric pollution remains one of the major public health threats worldwide, with an estimated 7 million deaths annually. In Africa, rapid urbanization and poor transport infrastructure are worsening the problem. In this paper, we analyse spatio-temporal variations of PM2.5 across different geographical regions in Africa. The West African region remains the most affected by high levels of pollution, with a daily average of 40.856 $\mu g/m^3$ in some cities like Lagos, Abuja and Bamako. In East Africa, Uganda reports the highest pollution level, with a daily average concentration of 56.14 $\mu g/m^3$, while Kigali records 38.65 $\mu g/m^3$. In countries located in the central region of Africa, the highest daily average PM2.5 concentration, 90.075 $\mu g/m^3$, was recorded in N'Djamena. We compare three data-driven models for predicting future trends in pollution levels. The neural network outperforms the Gaussian process and ARIMA models.

concentration, pm2, prediction, (16 more...)

arXiv:2208.12719

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.30)
Africa > Mali > Bamako > Bamako (0.26)
Africa > Chad > Chari-Baguirmi > N'Djamena (0.26)
(29 more...)

Genre: Research Report (0.64)

Industry:

Energy (0.93)
Law > Environmental Law (0.86)
Health & Medicine > Public Health (0.68)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.37)

Using Radio Archives for Low-Resource Speech Recognition: Towards an Intelligent Virtual Assistant for Illiterate Users

Doumbouya, Moussa, Einstein, Lisa, Piech, Chris

arXiv.org Artificial Intelligence, Apr-27-2021

For many of the 700 million illiterate people around the world, speech recognition technology could provide a bridge to valuable information and services. Yet, those most in need of this technology are often the most underserved by it. In many countries, illiterate people tend to speak only low-resource languages, for which the datasets necessary for speech technology development are scarce. In this paper, we investigate the effectiveness of unsupervised speech representation learning on noisy radio broadcasting archives, which are abundant even in low-resource languages. We make three core contributions. First, we release two datasets to the research community. The first, West African Radio Corpus, contains 142 hours of audio in more than 10 languages with a labeled validation subset. The second, West African Virtual Assistant Speech Recognition Corpus, consists of 10K labeled audio clips in four languages. Next, we share West African wav2vec, a speech encoder trained on the noisy radio corpus, and compare it with the baseline Facebook speech encoder trained on six times more data of higher quality. We show that West African wav2vec performs similarly to the baseline on a multilingual speech recognition task, and significantly outperforms the baseline on a West African language identification task. Finally, we share the first-ever speech recognition models for Maninka, Pular and Susu, languages spoken by a combined 10 million people in over seven countries, including six where the majority of the adult population is illiterate. Our contributions offer a path forward for ethical AI research to serve the needs of those most disadvantaged by the digital divide.

baseline wav2vec, representation, speech recognition, (12 more...)

arXiv:2104.13083

Country:

North America > United States > New York > New York County > New York City (0.05)
Africa > Niger (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(7 more...)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (1.00)
Media > Radio (0.89)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

The Internet of the Orals

Communications of the ACM, Oct-24-2019, 23:40:57 GMT

Internet services like social media, online discussion forums, and crowdsourcing marketplaces have transformed how people participate in the information ecology and digital economy. These services empower mostly urban, affluent, and literate people, and improve their reach to information and instrumental needs. However, these services currently exclude billions of people worldwide who are too poor to afford Internet-enabled devices, too remote to access the Internet, or too low-literate to navigate the mostly text-driven Internet. In India and Pakistan alone, there are nearly 1.1 billion people offline. Although 70% of their populations have access to mobile phones, most people still use basic or feature phones, which run custom operating systems and make it difficult to extend existing Internet services to these devices.

information, proceedings, voice forum, (16 more...)

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.05)
North America > Canada > Quebec > Montreal (0.05)
Europe > United Kingdom > Scotland > City of Glasgow > Glasgow (0.05)
(11 more...)

Industry:

Health & Medicine > Public Health (0.95)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.30)
Health & Medicine > Therapeutic Area > Immunology (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Communications > Social Media > Crowdsourcing (0.35)
